Create a destination connector for nomicdb #175

Open
wants to merge 2 commits into
base: main

Conversation

catle2aurecon

Description

While using unstructured-ingest to process data, I would like to ingest the results directly into a Nomic AtlasDataset and visualise them with its Atlas map.

Key changes

1. test_e2e/python/test-ingest-nomicdb.py: a simple integrated demonstration that processes local files with the Unstructured API and ingests the results into a Nomic map

2. unstructured_ingest/connector/nomicdb.py: the connector itself, implemented using connector/qdrant.py as an example

Testing

  • Provide a simple launch.json to run test_e2e/python/test-ingest-nomicdb.py.

@catle2aurecon
Author

Hi @potter-potter ,

Many thanks for your work at UnstructuredIO. This is my first contribution: adding a nomicdb connector to Unstructured-Ingest. I hope you can give me a pointer or two to improve my pull request.

@@ -0,0 +1,14 @@
{
Collaborator

This looks like an IDE-specific file; it shouldn't be in this PR.

@@ -0,0 +1,80 @@
import os
Collaborator

In general, we've moved to adding integration tests rather than e2e tests for new connectors. Take a look at this s3 example: test_s3.py. This should help isolate the connector code and make testing it easier.
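
To illustrate the idea of exercising connector code in isolation, here is a minimal sketch. It is not taken from test_s3.py or from this PR: `FakeAtlasClient` and `upload_elements` are hypothetical names invented for the example, and the in-memory fake stands in for the remote service only so the snippet runs on its own. A real integration test would presumably point the connector at an actual or locally hosted service instead.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAtlasClient:
    """Hypothetical in-memory stand-in for the remote service (not a real Nomic API)."""

    rows: list = field(default_factory=list)

    def add_data(self, data: list) -> None:
        # Record whatever the connector sends so the test can inspect it.
        self.rows.extend(data)


def upload_elements(elements: list, client, batch_size: int = 2) -> int:
    """Hypothetical connector logic: send elements to the client in batches.

    Returns the number of batches sent.
    """
    batches = 0
    for start in range(0, len(elements), batch_size):
        client.add_data(elements[start:start + batch_size])
        batches += 1
    return batches


def test_upload_elements_batches_all_rows():
    client = FakeAtlasClient()
    elements = [{"element_id": str(i), "text": f"chunk {i}"} for i in range(5)]

    batches = upload_elements(elements, client, batch_size=2)

    # Five elements in batches of two means three calls, with nothing lost or reordered.
    assert batches == 3
    assert client.rows == elements
```

Because the upload logic only depends on an object with an `add_data` method, the test needs no partitioning pipeline, API key, or local files, which is the kind of isolation an e2e script cannot offer.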

@@ -0,0 +1,125 @@
import multiprocessing as mp
Collaborator

For new connectors, this should live in the v2 directory using the new ingest framework: unstructured_ingest/v2/processes/connectors

Collaborator

This will change the general approach you've introduced here so I'll wait to review the actual connector files until that's been updated.

@rbiseck3 added the "needs edits" label (This PR has been reviewed and needs edits to be complete) on Oct 17, 2024
@rbiseck3
Collaborator

I've added the needs edits tag on this for now. Once it's ready for another review, feel free to remove that.

Labels
needs edits This PR has been reviewed and needs edits to be complete
2 participants